Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add proteus module for maxquant data analysis #147

Merged
merged 38 commits into from
Oct 10, 2023
Merged

Conversation

WackerO
Copy link
Collaborator

@WackerO WackerO commented Jul 13, 2023

This PR adds the proteus module to differentialabundance; it allows importing proteomics measurements from MaxQuant which can then be analyzed with the limma module

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs- [ ] If necessary, also make a PR on the nf-core/differentialabundance branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@github-actions
Copy link

github-actions bot commented Jul 18, 2023

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit f26a224

+| ✅ 159 tests passed       |+
!| ❗   1 tests had warnings |!

❗ Test warnings:

  • pipeline_todos - TODO string in WorkflowDifferentialabundance.groovy: Optionally add in-text citation tools to this list.

✅ Tests passed:

Run details

  • nf-core/tools version 2.10
  • Run at 2023-10-09 13:46:58

@WackerO WackerO marked this pull request as ready for review July 20, 2023 11:56
@WackerO WackerO marked this pull request as draft July 20, 2023 13:43
@WackerO WackerO marked this pull request as ready for review August 24, 2023 12:57
Copy link
Member

@pinin4fjords pinin4fjords left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of initial comments while I look at the rest.

features_log2_assays <- ""
}

if (is.null(features_log2_assays)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is repeated code from shinyngs, we should factor that out. I'll make a PR for it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WackerO the function for this is now available in the latest shinyngs, you just have to update the container for the notebook.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think we should use the factored-out conditional logging function here, so I've unresolved this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah of course, just had to update shinyngs for the rmarkdown report for that function to be found. I have for now added the [] removal in the report; maybe in a future release that part can be integrated into shiny as well

workflows/differentialabundance.nf Outdated Show resolved Hide resolved
Copy link
Member

@pinin4fjords pinin4fjords left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having a bit of trouble in this, in that you're duplicating a lot of code from the wider pipeline. We should be able to process conditionally in certain places, e.g. in how the matrices are generated which differential method is called, but basically follow the same structure.

e.g. CUSTOM_MATRIXFILTER should only really need to be called in one place. It should be possible to validate everything the same way, since we need to check the same things- i.e. that samples, matrices and metadata are compatible.

We should be able to handle this similarly to arrays. I'll take a look and see if I can make some more detailed suggestions, but maybe that is enough to start with.

features_log2_assays <- ""
}

if (is.null(features_log2_assays)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WackerO the function for this is now available in the latest shinyngs, you just have to update the container for the notebook.

workflows/differentialabundance.nf Outdated Show resolved Hide resolved
workflows/differentialabundance.nf Outdated Show resolved Hide resolved
workflows/differentialabundance.nf Outdated Show resolved Hide resolved
Comment on lines 181 to 192
ch_contrasts_split = ch_contrasts_file
.splitCsv ( header:true, sep:(params.contrasts.endsWith('tsv') ? '\t' : ','))
.map{ it.tail().first() }

// For proteus, extract only meta and contrast variable
ch_contrasts_proteus = ch_contrasts_split
.map{
tuple(
exp_meta, // meta map
it.variable // contrast variable
)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ch_contrasts_split = ch_contrasts_file
.splitCsv ( header:true, sep:(params.contrasts.endsWith('tsv') ? '\t' : ','))
.map{ it.tail().first() }
// For proteus, extract only meta and contrast variable
ch_contrasts_proteus = ch_contrasts_split
.map{
tuple(
exp_meta, // meta map
it.variable // contrast variable
)
}
ch_contrasts_split = ch_contrasts_file
.splitCsv ( header:true, sep:(params.contrasts.endsWith('tsv') ? '\t' : ','))
.map{ it.tail().first() }
// For proteus, extract only meta and contrast variable
ch_contrasts_proteus = ch_contrasts_split
.map{
tuple(
exp_meta, // meta map
it.variable // contrast variable
)
}

So, proteus will run once for every contrast variable? How does the contrast variable impact on the quantifications?

If it does, why does the proteus module return results keyed by the meta, rather than the meta2 corresponding to the contrast variable?

I think we can simplify this chunk a lot, but I need to understand what's going on better first.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooop, good point, I indeed had to change that code part. Proteus runs once for each contrast var to generate the plots. To the best of my knowledge, the quant tables are not different between contrasts (but just to make sure, I opened an issue and asked the developers). For that reason, I added a reduce to run the rest of the pipeline only for one of the tables.

I also changed the input channels to proteus (they were unnecessarily complicated as I realized) and the meta accordingly after adding a second contrast to the test dataset. I had a look at the output and looks good. The report is attached just in case you are curious, but it's nothing spectacular.

Btw, I found that apparently, RMD parameters that contain capital letters will cause a weird bug during knitting; I had to rename some params. Just as a hint for future coding.

PXD043349.html.zip

Copy link
Member

@pinin4fjords pinin4fjords left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting there- just a few more simplifications to facilitate future maintenance of these changes.

features_log2_assays <- ""
}

if (is.null(features_log2_assays)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think we should use the factored-out conditional logging function here, so I've unresolved this.

conf/test_maxquant.config Outdated Show resolved Hide resolved
docs/usage.md Outdated Show resolved Hide resolved
workflows/differentialabundance.nf Outdated Show resolved Hide resolved
workflows/differentialabundance.nf Outdated Show resolved Hide resolved
workflows/differentialabundance.nf Outdated Show resolved Hide resolved
} else if (params.study_type == 'maxquant'){

// For maxquant, we will use the processed matrices from PROTEUS
ch_features = ch_in_norm
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very similar to what happens in the final part of this conditional, so let's combine the two rather than duplicating what happens there. Right now the last bit is

        // Otherwise we can just use the matrix input
        matrix_as_anno_filename = "matrix_as_anno.${matrix_file.getExtension()}"
        matrix_file.copyTo(matrix_as_anno_filename)
        ch_features = Channel.of([ exp_meta, file(matrix_as_anno_filename)])

But you could do something like:

        // Otherwise we can just use the matrix input

        matrix_as_anno_filename = "matrix_as_anno.${matrix_file.getExtension()}"
        if (params.study_type == 'maxquant'){
            ch_features_matrix = ch_in_norm
        } else{
            ch_features_matrix = ch_in_raw
        }
        ch_features = ch_features_matrix
            .map{ exp_meta, matrix_file -> 
                matrix_file.copyTo(matrix_as_anno_filename)
                return [exp_meta, file(matrix_as_anno_filename)]
            }

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL that the inside of the map is not separate from the rest of the code. I had to rename exp_meta, matrix_file and return [exp_meta to not receive this error:
- cause: The current scope already contains a variable of the name exp_meta

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, my bad

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah and I also added the workdir to matrix_as_anno_filename = "${workflow.workDir}/matrix_as_anno.${matrix_file.getExtension()}", otherwise the matrix_as_anno is just created wherever the user happens to be while running the pipelines

workflows/differentialabundance.nf Outdated Show resolved Hide resolved
workflows/differentialabundance.nf Outdated Show resolved Hide resolved
Comment on lines 200 to 205
ch_in_raw = PROTEUS.out.raw_tab
.reduce{a, b -> a}
.map{tuple('id': exp_meta.id, it[1])}
ch_in_norm = PROTEUS.out.norm_tab
.reduce{a, b -> a}
.map{tuple('id': exp_meta.id, it[1])}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ch_in_raw = PROTEUS.out.raw_tab
.reduce{a, b -> a}
.map{tuple('id': exp_meta.id, it[1])}
ch_in_norm = PROTEUS.out.norm_tab
.reduce{a, b -> a}
.map{tuple('id': exp_meta.id, it[1])}
ch_in_raw = PROTEUS.out.raw_tab
.first()
.map{ meta, matrix -> tuple(exp_meta, matrix) }
ch_in_norm = PROTEUS.out.norm_tab
.first()
.map{ meta, matrix -> tuple(exp_meta, matrix) }

I think this is much simpler to understand.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oop, I did not realize first() has the same effect as my reduce() snippet. Thanks!

Copy link
Member

@pinin4fjords pinin4fjords left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, think we're good, much better integrated now, thanks for the work.

@WackerO
Copy link
Collaborator Author

WackerO commented Oct 10, 2023

Thanks for the review!

@WackerO WackerO merged commit c9d5328 into nf-core:dev Oct 10, 2023
@WackerO WackerO deleted the add_proteus branch October 10, 2023 05:43
@pinin4fjords pinin4fjords added this to the 1.3.0 milestone Oct 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants